7 research outputs found

    On binaural spatialization and the use of GPGPU for audio processing

    3D recordings and audio, namely techniques that aim to create the perception of sound sources placed anywhere in 3-dimensional space, are becoming an interesting resource for composers, live performances and augmented reality. This thesis focuses on binaural spatialization techniques, tackling the problem from three different perspectives. The first concerns the implementation of an engine for audio convolution; this is a real implementation problem, in which we compare against a number of already available systems, trying to achieve better performance. General-Purpose computing on Graphics Processing Units (GPGPU) is a promising approach to problems where a high degree of task parallelism is desirable. In this thesis the GPGPU approach is applied to both offline and real-time convolution, with the spatialization of multiple sound sources in mind, which is one of the critical problems in the field. Comparisons between this approach and typical CPU implementations are presented, as well as between FFT and time-domain approaches. The second aspect concerns the implementation of an augmented-reality system conceived as an "off-the-shelf" solution for most home computers, with no need for specialized hardware. A system capable of detecting the position of the listener through head tracking and rendering a 3D audio environment by binaural spatialization is presented. Head tracking is performed through face-tracking algorithms that use a standard webcam, and the result is presented over headphones, as in other typical binaural applications. With this system, users can choose audio files to play, assign virtual positions to the sources in a Euclidean space, and then listen to them as if they were coming from those positions. If users move their head, the signals provided by the system change accordingly in real time, thus producing the realistic effect of a coherent scene.
The last aspect covered by this work lies within the field of psychoacoustics, a long-term line of research in which we are interested in understanding how binaural audio and recordings are perceived, and how auralization systems can then be efficiently designed. Considerations regarding the quality and realism of such sounds in the context of Auditory Scene Analysis (ASA) are proposed.
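The FFT-versus-time-domain comparison mentioned above can be sketched on the CPU with NumPy; a GPGPU implementation would follow the same structure with the transforms and multiplications moved to GPU kernels. The signal and impulse-response lengths here are illustrative, not the thesis's benchmark settings:

```python
import numpy as np

def conv_time(signal, ir):
    """Direct time-domain convolution: O(N*M) multiply-adds."""
    return np.convolve(signal, ir)

def conv_fft(signal, ir):
    """Frequency-domain convolution: O(N log N) via the FFT."""
    n = len(signal) + len(ir) - 1
    nfft = 1 << (n - 1).bit_length()          # next power of two >= n
    spectrum = np.fft.rfft(signal, nfft) * np.fft.rfft(ir, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]

rng = np.random.default_rng(0)
dry = rng.standard_normal(48000)              # 1 s of noise at 48 kHz
hrir = rng.standard_normal(512)               # toy 512-tap impulse response

wet_td = conv_time(dry, hrir)
wet_fd = conv_fft(dry, hrir)
print(np.allclose(wet_td, wet_fd))            # -> True: both methods agree
```

The time-domain cost grows with the product of the two lengths, while the FFT route pays three transforms plus a pointwise product, which is why the frequency-domain approach wins for long impulse responses and many simultaneous sources.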

    Self-organizing the space of vocal imitations

    The human voice is a powerful instrument for producing sound sketches. The sonic space that can be spanned with the voice is vast and complex and, therefore, difficult to organize and explore. In this contribution, we report on our attempts at extracting the principal components from a database of 152 short excerpts of vocal imitations. We describe each excerpt by a set of statistical audio features and by a measure of the similarity of its envelope to a small number of prototype envelopes. We apply k-means clustering in a space whose dimensionality has been reduced by singular value decomposition, and discuss how meaningful the resulting clusters are. Eventually, a representative of each cluster, chosen to be close to its centroid, may serve as a landmark for exploring the sound space.
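The pipeline described above (statistical features, SVD reduction, k-means, centroid-nearest landmarks) can be sketched in NumPy on synthetic stand-in data; the feature values, number of retained components, and cluster count below are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the real feature matrix: 152 imitations x 20 descriptors
# (the paper uses statistical audio features plus envelope similarities;
# these values are synthetic).
features = rng.standard_normal((152, 20))

# Dimensionality reduction by SVD: project onto the strongest components.
centered = features - features.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ Vt[:5].T                 # keep 5 components

def kmeans(X, k, iters=50):
    """Plain k-means; keeps the old centroid if a cluster empties."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return centroids, labels

centroids, labels = kmeans(reduced, k=6)

# One landmark per cluster: the sample closest to its centroid.
landmarks = []
for j in range(6):
    members = np.where(labels == j)[0]
    d = np.linalg.norm(reduced[members] - centroids[j], axis=1)
    landmarks.append(int(members[d.argmin()]))
```

In practice the cluster count and component count would be chosen by inspecting the singular-value spectrum and the perceptual coherence of the resulting groups.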

    Analyzing and organizing the sonic space of vocal imitation

    The sonic space that can be spanned with the voice is vast and complex and, therefore, difficult to organize and explore. In order to devise tools that facilitate sound design by vocal sketching, we attempt to organize a database of short excerpts of vocal imitations. By clustering the sound samples in a space whose dimensionality has been reduced to the two principal components, we experimentally check how meaningful the resulting clusters are to humans. Eventually, a representative of each cluster, chosen to be close to its centroid, may serve as a landmark in the exploration of the sound space, and vocal imitations may serve as proxies for synthetic sounds.

    Further evidence of the contribution of the ear canal to directional hearing: design of a compensating filter

    It has been proven, and is well documented in the literature, that the directional response in HRTFs comes largely from the effect of the pinnae. However, few studies have analysed the contribution of the remaining part of the external ear, particularly the ear canal. This work investigates the directionally dependent response of the modelled ear canal of a dummy head, assuming that the behaviour of the external ear is sufficiently linear to be approximated by an LTI system. In order to extract the ear canal's transfer function, two critical microphone placements (at the eardrum and at the beginning of the cavum conchae) have been used. The system has been evaluated at several positions, along the azimuth plane and at different degrees of elevation. The results point out a non-negligible directional dependence that is well within the normal hearing range; based on these findings, physical models of the ear canal have been analysed and evaluated. We have also considered the practical application to binaural listening, and the colouration originated by the superimposition of the contributions of two ear canals (the listener's and the dummy head's). A compensating FIR filter with arbitrary frequency response is discussed as a possible fix.
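As an illustration of the final step, an FIR filter approximating an arbitrary magnitude response can be designed with SciPy's `firwin2`; the target frequencies and gains below are invented for the sketch and are not the measured ear-canal compensation data:

```python
import numpy as np
from scipy.signal import firwin2, freqz

fs = 48000
# Hypothetical compensation magnitude response, specified at a few
# frequency knots (these gains are illustrative, not measurements).
freqs = [0, 1000, 3000, 6000, 12000, fs / 2]
gains = [1.0, 1.0, 0.5, 0.8, 1.2, 0.0]

# Linear-phase FIR filter whose magnitude interpolates the knots.
taps = firwin2(255, freqs, gains, fs=fs)

# Inspect how closely the realized response tracks the request.
w, h = freqz(taps, worN=2048, fs=fs)
```

A linear-phase design keeps the compensation from introducing its own phase colouration; the number of taps trades frequency resolution against processing latency.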

    Sound and the City: Multi-Layer Representation and Navigation of Audio Scenarios

    IEEE 1599-2008 is an XML-based standard originally intended for the multi-layer representation of music information. Nevertheless, it is versatile enough to also describe information other than traditional scores written according to the rules of Common Western Notation (CWN). This paper discusses the application of IEEE 1599-2008 to the audio description of paths and scenarios from urban life or other landscapes. The standard we adopt allows the multi-layer integration of textual, symbolic, structural, graphical, audio and video contents within a single synchronized environment. Moreover, for each kind of medium, a number of digital objects are supported. As a consequence, thanks to the features of the format, the resulting description is more than a mere audio track, a slideshow made of sonified static images, or a movie. Finally, an ad hoc evolution of a standard viewer for IEEE 1599 documents is presented, allowing the results of our efforts to be enjoyed.

    PureMX: Automatic transcription of MIDI live music performances into XML format

    This paper addresses the problem of the real-time automatic transcription of a live music performance into a symbolic format based on XML. The source data are provided by any music instrument or other device able to communicate with Pure Data via MIDI. Pure Data is a free, multi-platform, real-time programming environment for graphical, audio, and video processing. During a performance, music events are parsed and their parameters evaluated by rhythm- and pitch-detection algorithms. The final step is the creation of a well-formed XML document, validated against the new international standard known as IEEE 1599. This work briefly describes both the software environment and the XML format, but the main analysis concerns the real-time recognition of music events. Finally, a case study is presented: PureMX, an application able to perform such an automatic transcription.
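The two recognition steps named above, pitch and rhythm evaluation of MIDI events, can be sketched in plain Python; the helper names, tempo, and event list are hypothetical and far simpler than PureMX's actual algorithms:

```python
# Pitch: a MIDI note number maps directly to a pitch class and octave.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_pitch(note):
    """MIDI note number -> scientific pitch name (69 -> 'A4')."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

def quantize(onset_ms, bpm=120, division=4):
    """Rhythm: snap an onset (ms) to the nearest sixteenth-note slot."""
    grid = 60000 / bpm / division          # grid step in ms (125 ms here)
    return round(onset_ms / grid)

# Toy performance: (onset in ms, MIDI note number), slightly off the grid.
events = [(0, 60), (130, 64), (240, 67), (510, 72)]
transcribed = [(quantize(t), midi_to_pitch(n)) for t, n in events]
print(transcribed)   # [(0, 'C4'), (1, 'E4'), (2, 'G4'), (4, 'C5')]
```

A real transcriber must additionally estimate the tempo itself, merge near-simultaneous onsets into chords, and infer durations before the events can be serialized to IEEE 1599 XML.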

    Head in space: A head-tracking based binaural spatialization system

    This paper discusses a system capable of detecting the position of the listener through head tracking and rendering a 3D audio environment by binaural spatialization. Head tracking is performed through face recognition algorithms which use a standard webcam, and the result is presented over headphones, as in other typical binaural applications. With this system, users can choose an audio file to play, provide a virtual position for the source in a Euclidean space, and then listen to the sound as if it were coming from that position. If they move their head, the signal provided by the system changes accordingly in real time, thus providing a realistic effect.
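A minimal sketch of the head-coupled rendering idea, using crude interaural time and level differences instead of the measured HRTFs a real binaural system would apply; the function, constants, and signal are illustrative assumptions, not the system's implementation:

```python
import numpy as np

def binaural_pan(mono, source_az, head_yaw, fs=44100):
    """Render a mono source at source_az degrees for a head turned by
    head_yaw degrees: cues are computed from the azimuth *relative to
    the head*, so turning the head re-renders the scene."""
    rel = np.deg2rad(source_az - head_yaw)
    itd = 0.0007 * np.sin(rel)                # ~0.7 ms max interaural delay
    delay = int(round(abs(itd) * fs))
    ild = 10 ** (6 * np.sin(rel) / 20)        # up to ~6 dB level difference
    left, right = mono / ild, mono * ild
    pad = np.zeros(delay)
    if itd > 0:                               # source to the right: left lags
        left = np.concatenate([pad, left])[:len(mono)]
    else:                                     # source to the left: right lags
        right = np.concatenate([pad, right])[:len(mono)]
    return left, right

tone = np.sin(2 * np.pi * 440 * np.arange(4410) / 44100)   # 100 ms at 440 Hz
L, R = binaural_pan(tone, source_az=45, head_yaw=0)        # source 45° right
```

Turning the head toward the source (head_yaw equal to source_az) makes the relative azimuth zero and both ears receive the same signal, which is the coherence the tracked system maintains continuously.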